In image processing, most image filters and image transformations use convolutions. A convolution modifies the original matrix of pixels by multiplying each pixel neighborhood pointwise with a kernel (or filter) matrix and summing the results. Wikipedia describes convolutions on images as:
Convolution is the process of multiplying each element of the image with its local neighbors, weighted by the kernel. For example, if we have two three-by-three matrices, one a kernel, and the other an image piece, convolution is the process of flipping both the rows and columns of the kernel and then multiplying locationally similar entries and summing. The [2,2] element of the resulting image would be a weighted combination of all the entries of the image matrix, with weights given by the kernel:
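To make the flip, multiply and sum description concrete, here is a small NumPy sketch. The 3x3 `image_piece` and `kernel` values are hypothetical and chosen only for illustration; they are not taken from the MNIST data.

```python
import numpy as np

# Hypothetical 3x3 image piece and kernel, for illustration only.
image_piece = np.array([[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]], dtype=float)
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

# Flip both the rows and the columns of the kernel ...
flipped = kernel[::-1, ::-1]

# ... then multiply locationally similar entries and sum them.
# This is the [2,2] (central) element of the convolved result.
center_value = float(np.sum(image_piece * flipped))

# Without the flip, the same multiply-and-sum is a cross-correlation.
unflipped_value = float(np.sum(image_piece * kernel))

print("with flip: %s, without flip: %s" % (center_value, unflipped_value))
```

For a kernel that is symmetric under this flip (a blur kernel, for example), both values are identical; for an antisymmetric kernel like the one above, only the sign changes.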
Among the many applications of convolutions, image blurring, sharpening and edge detection are the most common. In this demo, we will use MLDB queries to efficiently transform images. To do so, we will use the MNIST database of handwritten digits.
We will use the jseval function to execute JavaScript code inline with SQL, and a SQL Expression Function to persist and reuse the same JavaScript code.
The notebook cells below use pymldb's Connection class to make REST API calls. You can check out the Using pymldb Tutorial for more details.
In [1]:
from pymldb import Connection
mldb = Connection()
... and other Python libraries
In [2]:
import random
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import display, Latex
from ipywidgets import widgets, interact
A pickled version of the dataset is available on the deeplearning.net website.
The dataset has been unpickled and saved in public Amazon S3 cloud storage. Check out MLDB's Protocol Handlers for Files and URLs for more details on loading remote resources.
In [3]:
data_url_mnist = 'http://public.mldb.ai/datasets/digits_data.csv.gz'
print mldb.put('/v1/procedures/import_digits_mnist', {
"type":"import.text",
"params": {
"dataFileUrl": data_url_mnist,
"outputDataset": "digits_mnist",
"select": "{* EXCLUDING(\"785\")} AS *, \"785\" AS label",
"runOnCreation": True,
}
})
As in the first few steps of the Real-Time Digits Recognizer demo, we will display random MNIST digits from the test set. At each refresh, we get a randomly selected row using the sample function in a SQL From Expression.
In [4]:
data = mldb.query("""
SELECT * EXCLUDING(label)
FROM sample(
(select * from digits_mnist where rowHash() % 5 = 0),
{rows: 1}
)
""")
image = data.as_matrix().reshape(28, 28)
plt.imshow(image)
plt.gray()
A discrete convolution can be defined mathematically as:
$newPixel[i,j] = \sum_{y=i}^{i+r-1}\sum_{x=j}^{j+r-1}oldPixel[x,y] \cdot weight[x-j,\,y-i]$
where the $weight[]$ matrix (see the 'kernelDict' dictionary a couple of cells below) defines the type of image manipulation and $r$ is the side length of the square kernel, which sets the area of effect. Imagine a "square box" of $r \times r$ pixels that starts at the pixel that you want to transform. The kernel-weighted sum of the "old pixels" in the "square box" gives you a "new pixel".
As seen in the code below, each new pixel in the convolved picture is the weighted sum of the pixel and its neighboring pixels, where the weights are the values in the kernel matrix.
Implementing the convolution as a custom SQL Expression Function, with jseval providing the inline JavaScript definition, allows us to process large amounts of data using the optimizations inherent to MLDB. Convolutions are typically very time-consuming operations: the complexity in this case is $O(n\cdot r^2)$, where $n$ is the number of features (pixels) and $r$ is the side length of the kernel (named radius in the code).
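The weighted sum can be sketched in plain Python before reading the JavaScript version. This is a minimal sketch, not MLDB code: `convolve_clamped` and the 4x4 test image are hypothetical, the box is assumed to start at pixel (i, j) and extend r pixels to the right and down (following the summation bounds), and coordinates outside the image are assumed to be clamped to the nearest edge pixel.

```python
def convolve_clamped(pixels, weights, width, height, r):
    """Weighted sum over an r-by-r box whose top-left corner is (i, j).

    pixels and weights are flat, row-major lists. Coordinates that fall
    outside the image are clamped to the nearest edge pixel.
    """
    out = [0.0] * (width * height)
    for i in range(height):
        for j in range(width):
            new_pixel = 0.0
            index_w = 0
            for yr in range(i, i + r):
                for xr in range(j, j + r):
                    y = min(height - 1, max(0, yr))
                    x = min(width - 1, max(0, xr))
                    new_pixel += pixels[y * width + x] * weights[index_w]
                    index_w += 1
            out[i * width + j] = new_pixel
    return out


# Hypothetical 4x4 image holding the values 0..15.
image = list(range(16))

# A kernel that keeps only the first pixel of the box returns the image unchanged.
keep_first = [1, 0, 0, 0, 0, 0, 0, 0, 0]
unchanged = convolve_clamped(image, keep_first, 4, 4, 3)

# A kernel of ones sums every pixel in the box.
box_sum = convolve_clamped(image, [1] * 9, 4, 4, 3)

# Number of multiply-adds: n pixels times r*r weights.
operations = 4 * 4 * 3 * 3
print("multiply-adds: %d" % operations)
```

The four nested loops make the $n \cdot r^2$ cost visible: every one of the 16 pixels requires 9 multiply-adds.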
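There were two steps to creating the function below: writing the convolution in JavaScript so that it can run inline through jseval, then wrapping that expression in a SQL Expression Function so that it can be reused by name.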
In [5]:
# JavaScript code loosely based on Ivan Kuckir's blog post: http://blog.ivank.net/fastest-gaussian-blur.html
def create_convolution():
    JsConvolutionExpr = """
        jseval('
            var row_val = val;
            var dim = Math.sqrt(row_val.length);
            var radius = Math.sqrt(kernel.length);

            /*************************************
            ******** Function Definition *********
            **************************************/
            // input 1D list, output 1D list, pixel matrix dimensions
            function convolution(inList, outList, width, height, radius) {
                for (var i = 0; i < height; i++)
                    for (var j = 0; j < width; j++) {
                        var newPixel = 0;
                        var indexW = 0;
                        for (var yr = i; yr < i + radius; yr++)
                            for (var xr = j; xr < j + radius; xr++) {
                                var y = Math.min(height - 1, Math.max(0, yr));
                                var x = Math.min(width - 1, Math.max(0, xr));
                                newPixel = newPixel + inList[y * width + x] * weights[indexW];
                                indexW ++;
                            }
                        new_value = newPixel;
                        outList[i * width + j] = new_value;
                    }
                return outList;
            } // End of convolution

            //Assuring that the 1d row is in the right order
            function arrangeMatrix(inList) {
                var length = inList.length;
                var data = new Array(length);
                for (var i = 0; i < length; i++) {
                    data[parseInt(inList[i][0][0])] = inList[i][1];
                }
                return data
            }

            /*************************************
            ********** Using Functions ***********
            **************************************/
            var weights = arrangeMatrix(kernel); // filter matrix
            var matrix = arrangeMatrix(row_val); // my picture
            var convolvedMatrix = [];
            convolution(matrix, convolvedMatrix, dim, dim, radius);
            return convolvedMatrix;',
            'val, kernel',
            valueExpr, kernel
        ) AS *
    """
    print mldb.put("/v1/functions/convolution", {
        "type": "sql.expression",
        "params": {
            "expression": JsConvolutionExpr,
            "prepared": True
        }
    })
create_convolution()
This function will be used in the interactive menu in the next section. We will take the image of the digit that we have seen before and apply different filters. You will need to run the cells in this notebook to make it work.
In [6]:
kernelDict = {
'Right Sobel': [-1, 0, 1, -2, 0, 2, -1, 0, 1],
'Detect Edges': [1, 1, 1, 1, -8, 1, 1, 1, 1],
'Sharpen': [0, -1, 0, -1, 5, -1, 0, -1, 0],
'Box Blur': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
'Approximated Gaussian Blur': [0.0625, 0.125, 0.0625, 0.125, 0.25, 0.125, 0.0625, 0.125, 0.0625]
}
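A quick way to read these kernels is to look at the sum of their weights. The sketch below copies the values from the cell above into a hypothetical `kernels` dictionary so that it runs on its own. A kernel whose weights sum to 1 leaves a uniform region unchanged, while one whose weights sum to 0 maps a uniform region to 0, which is why the edge-detection kernels only respond where the intensity changes.

```python
# Copy of the kernel values from the cell above, so the sketch is self-contained.
kernels = {
    'Right Sobel': [-1, 0, 1, -2, 0, 2, -1, 0, 1],
    'Detect Edges': [1, 1, 1, 1, -8, 1, 1, 1, 1],
    'Sharpen': [0, -1, 0, -1, 5, -1, 0, -1, 0],
    'Box Blur': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    'Approximated Gaussian Blur': [0.0625, 0.125, 0.0625, 0.125, 0.25,
                                   0.125, 0.0625, 0.125, 0.0625],
}

# Sum of the weights of each kernel, rounded to hide floating-point noise.
sums = {name: round(sum(values), 4) for name, values in kernels.items()}

for name in sorted(sums):
    print("%s: %s" % (name, sums[name]))
```

Note that the 'Box Blur' weights sum to 0.9 rather than 1, so that kernel also scales intensities down slightly; an exact average over nine pixels would use 1/9 for each weight.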
In [7]:
def convolutionFunc(image_processing):
    SQL_Expr = """
        SELECT convolution({
            valueExpr: %(data)s,
            kernel: %(kernel)s
        }) AS *
    """ % {
        "data": data.values[0].tolist(),
        "kernel": kernelDict[image_processing]
    }
    convolvedData = mldb.query(SQL_Expr)
    image = convolvedData.as_matrix().reshape(28, 28)
    plt.imshow(image)
In [8]:
options=('Right Sobel', 'Detect Edges', 'Sharpen', 'Box Blur', 'Approximated Gaussian Blur')
interact(convolutionFunc, image_processing=kernelDict.keys(), );
print "Choose an image processing option from the drop-down menu"
I found the 'Detect Edges' convolution particularly useful when training image recognition models, and edge detection more generally is useful in many machine vision applications.
Not everyone will want to code their own convolutions from scratch (such as with the create_convolution() function above). In fact, given the myriad of tools available, it may save you time and effort to use external libraries. MLDB has integrated the TensorFlow Open Source Library for Machine Intelligence, allowing us to leverage some of the great Computer Vision APIs and GPU acceleration that it offers. Let's get started with the same images as before.
First, I reshape my image and kernel lists into 4D tensors in the NHWC tensor format. Then, I use tf_Conv2D, the TensorFlow operator that is exposed as an MLDB built-in function directly in SQL.
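The reshaping step can be checked with NumPy alone. In this sketch, `flat_image` is a hypothetical stand-in for one 784-value MNIST row and `flat_kernel` holds the 'Sharpen' weights; the assertions show where a given pixel or weight ends up in the 4D layout.

```python
import numpy as np

# Hypothetical stand-in for one MNIST row: 784 values in row-major order.
flat_image = np.arange(784, dtype=float)
# The 'Sharpen' kernel as a flat list of 9 weights.
flat_kernel = np.array([0, -1, 0, -1, 5, -1, 0, -1, 0], dtype=float)

# Image tensor: [batch, in_height, in_width, in_channels]
image_nhwc = flat_image.reshape(1, 28, 28, 1)
# Kernel tensor: [filter_height, filter_width, in_channels, out_channels]
kernel_hwio = flat_kernel.reshape(3, 3, 1, 1)

# The pixel at row 2, column 5 is element 2 * 28 + 5 of the flat row.
assert image_nhwc[0, 2, 5, 0] == flat_image[2 * 28 + 5]
# The central weight of the kernel sits at position [1, 1].
assert kernel_hwio[1, 1, 0, 0] == 5.0

print("%s %s" % (image_nhwc.shape, kernel_hwio.shape))
```

With a single grayscale image and a single filter, the batch, channel and output-channel dimensions all have size 1, so the reshape only adds axes and never reorders the values.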
In [9]:
data_ = data.values[0].reshape(1, 28, 28, 1).tolist()
# image input must be a [batch, in_height, in_width, in_channels] shaped tensor
In [10]:
def TensorFlowConvolution(image_processing):
    kernel = np.asarray(kernelDict[image_processing]).reshape(3, 3, 1, 1).tolist()
    # kernel must be a [filter_height, filter_width, in_channels, out_channels] shaped tensor
    strides = [1, 1, 1, 1]
    SQL_Expr = """
        SELECT tf_Conv2D(
            {input: %(data)s, filter: %(kernel)s},
            {T: { type: 'DT_FLOAT'}, padding: 'SAME', strides: %(strides)s })
        AS *
    """ % {
        "data": data_,
        "kernel": kernel,
        "strides": strides
    }
    convolvedData = mldb.query(SQL_Expr)
    image = convolvedData.as_matrix().reshape(28, 28)
    plt.imshow(image)
In [11]:
options=('Right Sobel', 'Detect Edges', 'Sharpen', 'Box Blur', 'Approximated Gaussian Blur')
interact(TensorFlowConvolution, image_processing=kernelDict.keys(), );
print "Choose an image processing option from the drop-down menu"
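To get a feel for what the 'SAME' padding and unit strides do, here is a NumPy sketch of a single-channel 2D filter pass. It is an approximation for intuition only, not the TensorFlow implementation: `conv2d_same` is a hypothetical helper that assumes an odd-sized kernel, stride 1, zero padding on every side, and no kernel flip (TensorFlow's Conv2D is, to my understanding, a cross-correlation in this sense).

```python
import numpy as np

def conv2d_same(image, kernel):
    """Stride-1 cross-correlation with zero padding.

    The output has the same height and width as the input.
    Assumes a 2D image and an odd-sized 2D kernel.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Surround the image with zeros so the window can be centered on edge pixels.
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode='constant')
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# Hypothetical 4x4 image and the 'Sharpen' kernel from kernelDict.
image = np.arange(16, dtype=float).reshape(4, 4)
sharpen = np.array([0, -1, 0, -1, 5, -1, 0, -1, 0], dtype=float).reshape(3, 3)
result = conv2d_same(image, sharpen)
print(result)
```

Here the window is centered on each pixel and the border is filled with zeros, whereas the JavaScript function above starts its window at the pixel and clamps coordinates to the image edge, so the two approaches can differ near the borders.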
Here are a few definitions:
Now you can move on to the Real-Time Digits Recognizer demo where we'll show the machine learning steps to follow to build MLPaint, the real-time digits recognizer plugin.
Otherwise, check out the other Tutorials and Demos.